Method and apparatus for accessing data of multi-tile encoded picture stored in buffering apparatus

ABSTRACT

A method for read pointer maintenance of a buffering apparatus, which is arranged to buffer data of a multi-tile encoded picture having a plurality of tiles included therein, includes the following steps: judging if decoding of a first tile of the multi-tile encoded picture encounters a tile boundary of the first tile; and when it is judged that the tile boundary of the first tile is encountered, storing a currently used read pointer into a pointer buffer, and loading a selected read pointer from the pointer buffer to act as the currently used read pointer.

CROSS REFERENCE TO RELATED APPLICATIONS

This is a continuation of U.S. patent application Ser. No. 13/681,426 (filed on Nov. 20, 2012), which is a continuation-in-part of U.S. patent application Ser. No. 13/304,372 (filed on Nov. 24, 2011) and further claims the benefit of U.S. provisional application No. 61/566,984 (filed on Dec. 5, 2011), where U.S. patent application Ser. No. 13/304,372 (filed on Nov. 24, 2011) claims the benefit of U.S. provisional application No. 61/433,272 (filed on Jan. 17, 2011). The entire contents of the related applications are incorporated herein by reference.

BACKGROUND

The disclosed embodiments of the present invention relate to video/image processing, and more particularly, to a method and apparatus for accessing data of a multi-tile encoded picture stored in a buffering apparatus.

VP8 is an open video compression format released by Google®. Like many modern video compression schemes, VP8 is based on decomposition of frames into square subblocks of pixels, prediction of such subblocks using previously constructed blocks, and adjustment of such predictions (as well as synthesis of unpredicted blocks) using a discrete cosine transform (DCT). In one special case, however, VP8 uses a Walsh-Hadamard transform (WHT) instead of the commonly used DCT.

WebP is an image format developed by Google® according to VP8. Specifically, WebP is based on VP8's intra-frame coding and uses a container based on the resource interchange file format (RIFF). Besides, WebP is announced to be a new open specification that provides lossy compression for photographic images. In a large-scale study of 900,000 web images, WebP images were found to be 39.8% smaller than Joint Photographic Experts Group (JPEG) images of similar quality. Webmasters, web developers and browser developers therefore can use the WebP format to create smaller, better looking images that can help to improve users' web surfing.

In accordance with the VP8/WebP specification, the input to a VP8/WebP decoder is a sequence of compressed frames whose order matches their order in time. Besides, every compressed frame has multiple partitions included therein. As the VP8/WebP bitstream is configured to transmit compressed frames each having a plurality of partitions included therein, how to efficiently buffer and decode each compressed frame of a multi-partition VP8/WebP bitstream becomes an important issue in this technical field.

As proposed in the High-Efficiency Video Coding (HEVC) specification, one picture can be partitioned into multiple tiles. FIG. 19 is a diagram illustrating tiles adopted in the HEVC specification. FIG. 20 is a diagram illustrating a conventional decoding order of the tiles shown in FIG. 19. As shown in FIG. 19, one picture 10 is partitioned into a plurality of tiles T₁₁′-T₁₃′, T₂₁′-T₂₃′, T₃₁′-T₃₃′ separated by row boundaries (i.e., horizontal boundaries) HB₁′, HB₂′ and column boundaries (i.e., vertical boundaries) VB₁′, VB₂′. Inside each tile, largest coding units (LCUs)/treeblocks (TBs) are raster scanned, as shown in FIG. 20. For example, LCUs/TBs orderly indexed by the Arabic numbers in the same tile T₁₁′ are decoded sequentially. Inside each multi-tile picture, tiles are raster scanned, as shown in FIG. 20. For example, the tiles T₁₁′-T₁₃′, T₂₁′-T₂₃′ and T₃₁′-T₃₃′ are decoded sequentially. Specifically, one picture can be uniformly partitioned by tiles or partitioned into specified LCU-column-row tiles. A tile is a partition which has vertical and horizontal boundaries, and it is always rectangular with an integer number of LCUs/TBs included therein.

In accordance with the HEVC specification, there are two types of tiles: independent tiles and dependent tiles. As to the independent tiles, they are treated as sub-pictures/sub-streams. Hence, encoding/decoding LCUs/TBs of an independent tile (e.g., motion vector prediction, intra prediction, deblocking filter (DF), sample adaptive offset (SAO), adaptive loop filter (ALF), entropy coding, etc.) does not need data from other tiles. Besides, assume that data of the LCUs/TBs is encoded/decoded using arithmetic coding such as a context-based adaptive binary arithmetic coding (CABAC) algorithm. Regarding each independent tile, the CABAC statistics are initialized/re-initialized at the start of the tile, and the LCUs outside the tile boundaries of the tile are regarded as unavailable.

For example, the CABAC statistics at the first LCU/TB indexed by “1” in the tile T₁₁′ would be initialized when decoding of the tile T₁₁′ is started, the CABAC statistics at the first LCU/TB indexed by “13” in the tile T₁₂′ would be re-initialized when decoding of the tile T₁₂′ is started, the CABAC statistics at the first LCU/TB indexed by “31” in the tile T₁₃′ would be re-initialized when decoding of the tile T₁₃′ is started, and the CABAC statistics at the first LCU/TB indexed by “40” in the tile T₂₁′ would be re-initialized when decoding of the tile T₂₁′ is started.
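The first-LCU indices quoted above follow from the two nested raster scans (tiles within the picture, LCUs within each tile). The following minimal Python sketch reproduces them under an assumed tile geometry: the column widths of 4, 6 and 3 LCUs and the row heights of 3 LCUs are hypothetical values chosen only because they are consistent with the indices 1, 13, 31 and 40, and the function name is illustrative rather than taken from any specification.

```python
def tile_first_lcu_indices(col_widths, row_heights):
    """Map (tile_row, tile_col) to the 1-based index of the tile's first LCU.

    Tiles are visited in raster order, and every LCU of a tile is decoded
    before the next tile starts, so each tile's first index is one more
    than the number of LCUs in all preceding tiles.
    """
    first = {}
    next_index = 1
    for r, height in enumerate(row_heights, start=1):
        for c, width in enumerate(col_widths, start=1):
            first[(r, c)] = next_index
            next_index += width * height  # LCUs consumed by this tile
    return first


# Hypothetical geometry in LCUs (an assumption, not read from FIG. 19).
first = tile_first_lcu_indices(col_widths=[4, 6, 3], row_heights=[3, 3, 3])
print(first[(1, 1)], first[(1, 2)], first[(1, 3)], first[(2, 1)])  # 1 13 31 40
```

For independent tiles, each of these first indices marks a point at which the CABAC statistics would be initialized or re-initialized.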

However, encoding/decoding LCUs/TBs of a dependent tile (e.g., motion vector prediction, intra prediction, DF, SAO, ALF, entropy coding, etc.) has to consider data provided by other tiles. Hence, vertical and horizontal buffers are required for successfully decoding a multi-tile encoded picture/compressed frame having dependent tiles included therein. Specifically, the vertical buffer is used for buffering decoded information of LCUs/TBs of an adjacent tile beside a vertical boundary (e.g., a left vertical boundary) of a currently decoded tile, and the horizontal buffer is used for buffering decoded information of LCUs/TBs of another adjacent tile beside a horizontal boundary (e.g., a top horizontal boundary) of the currently decoded tile. As a result, the buffer size for decoding the multi-tile encoded picture/compressed frame would be large, leading to higher production cost. Besides, assume that data of the LCUs/TBs is encoded/decoded using arithmetic coding such as a CABAC algorithm. Regarding a dependent tile, the CABAC statistics may be initialized at the start of the tile or inherited from another tile. For example, the CABAC statistics at the first LCU/TB indexed by “1” in the tile T₁₁′ would be initialized when decoding of the tile T₁₁′ is started, the CABAC statistics at the first LCU/TB indexed by “13” in the tile T₁₂′ would be inherited from the CABAC statistics at the last LCU/TB indexed by “12” in the tile T₁₁′ when decoding of the tile T₁₂′ is started, the CABAC statistics at the first LCU/TB indexed by “31” in the tile T₁₃′ would be inherited from the CABAC statistics at the last LCU/TB indexed by “30” in the tile T₁₂′ when decoding of the tile T₁₃′ is started, and the CABAC statistics at the first LCU/TB indexed by “40” in the tile T₂₁′ would be inherited from the CABAC statistics at the last LCU/TB indexed by “39” in the tile T₁₃′ when decoding of the tile T₂₁′ is started.

Regarding the Joint Photographic Experts Group extended range (JPEG-XR) specification, one picture can be partitioned into specified macroblock-column-row tiles. A tile is a partition which has vertical and horizontal boundaries, and it is always rectangular with an integer number of macroblocks (MBs) included therein. Inside each tile, MBs are raster scanned. Inside each multi-tile picture, tiles are raster scanned. In accordance with the JPEG-XR specification, there are two types of tiles: hard tiles and soft tiles. As to the hard tiles, they are treated as sub-pictures. Hence, encoding/decoding MBs of a hard tile does not need data from other tiles. However, encoding/decoding MBs of a soft tile has to consider data provided by other tiles. For example, in soft tiles, overlap filtering may be applied across tile boundaries.

As the multi-tile HEVC/JPEG-XR bitstream is configured to transmit encoded/compressed frames each having a plurality of tiles included therein, how to efficiently buffer and decode each encoded/compressed frame of the multi-tile HEVC/JPEG-XR bitstream becomes an important issue in this technical field.

SUMMARY

In accordance with exemplary embodiments of the present invention, a method and apparatus for accessing data of a multi-tile encoded picture in a buffering apparatus are proposed to solve the above-mentioned problem.

According to a first aspect of the present invention, an exemplary method for read pointer maintenance of a buffering apparatus is disclosed. The buffering apparatus is arranged to buffer data of a multi-tile encoded picture having a plurality of tiles included therein. The exemplary method includes: judging if decoding of a first tile of the multi-tile encoded picture encounters a tile boundary of the first tile; and when it is judged that the tile boundary of the first tile is encountered, storing a currently used read pointer into a pointer buffer, and loading a selected read pointer from the pointer buffer to act as the currently used read pointer.
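The store-and-load step of the first aspect may be pictured with the following minimal Python sketch. It is an illustration under stated assumptions rather than the claimed implementation: the class name, the use of a dictionary keyed by tile as the pointer buffer, and the convention that the caller names the tile whose read pointer is selected are all hypothetical.

```python
class ReadPointerManager:
    """Keeps one saved read pointer per tile and one currently used pointer."""

    def __init__(self, tile_start_addresses):
        # Pointer buffer: initially each tile's read pointer is its start address.
        self.pointer_buffer = dict(tile_start_addresses)
        self.current_tile = None
        self.current_rptr = None

    def start(self, tile_id):
        self.current_tile = tile_id
        self.current_rptr = self.pointer_buffer[tile_id]

    def advance(self, nbytes):
        # Decoding consumes data, so the currently used read pointer moves on.
        self.current_rptr += nbytes

    def on_tile_boundary(self, next_tile):
        # Store the currently used read pointer into the pointer buffer ...
        self.pointer_buffer[self.current_tile] = self.current_rptr
        # ... and load the selected read pointer to act as the current one.
        self.current_tile = next_tile
        self.current_rptr = self.pointer_buffer[next_tile]


mgr = ReadPointerManager({"T11": 0, "T12": 100})
mgr.start("T11")
mgr.advance(10)              # decode part of T11
mgr.on_tile_boundary("T12")  # boundary of T11 reached, switch to T12
mgr.advance(5)
mgr.on_tile_boundary("T11")  # resume T11 where it left off
print(mgr.current_rptr)      # 10
```

Saving the pointer at each boundary lets decoding of a tile resume later from the exact position at which it was suspended.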

According to a second aspect of the present invention, an exemplary buffer controller for read pointer maintenance of a buffering apparatus is disclosed. The exemplary buffering apparatus is arranged to buffer data of at least a multi-tile encoded picture having a plurality of tiles included therein. The exemplary buffer controller includes a judging unit and a control unit. The judging unit is arranged for judging if decoding of a first tile of the multi-tile encoded picture encounters a tile boundary of the first tile. The control unit is arranged for storing a currently used read pointer into a pointer buffer and loading a selected read pointer from the pointer buffer to act as the currently used read pointer when the judging unit judges that the tile boundary is encountered.

According to a third aspect of the present invention, an exemplary buffering apparatus for buffering data of at least a multi-tile encoded picture having a plurality of tiles included therein is disclosed. The exemplary buffering apparatus includes a first storage space and a second storage space. The first storage space is arranged to buffer a first tile of the multi-tile encoded picture. The second storage space is arranged to buffer a second tile of the multi-tile encoded picture. The first tile is currently decoded, and the second tile is not currently decoded.

These and other objectives of the present invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram illustrating multiple partitions in a compressed frame to be processed by a proposed buffering apparatus of the present invention.

FIG. 2 is a diagram illustrating how transform coefficients in a compressed frame are packed into four partitions.

FIG. 3 is a diagram illustrating a video/image decoding system according to a first exemplary embodiment of the present invention.

FIG. 4 is a diagram illustrating a video/image decoding system according to a second exemplary embodiment of the present invention.

FIG. 5 is a diagram illustrating an alternative design of a buffering apparatus.

FIG. 6 is a diagram illustrating a video/image decoding system according to a third exemplary embodiment of the present invention.

FIG. 7 is a diagram illustrating a video/image decoding system according to a fourth exemplary embodiment of the present invention.

FIG. 8 is a diagram illustrating a video/image decoding system according to a fifth exemplary embodiment of the present invention.

FIG. 9 is a diagram illustrating an exemplary entropy decoding operation performed by the entropy decoder shown in FIG. 8.

FIG. 10 is a diagram illustrating a buffer controller according to an embodiment of the present invention.

FIG. 11 is a diagram illustrating a sketch map of a multi-tile video/image bitstream according to an embodiment of the present invention.

FIG. 12 is a diagram illustrating a plurality of tiles each including a plurality of slices according to an embodiment of the present invention.

FIG. 13 is a diagram illustrating another sketch map of the multi-tile video/image bitstream according to an embodiment of the present invention.

FIG. 14 is a diagram illustrating an exemplary read pointer maintenance operation of the buffering apparatus.

FIG. 15 is a diagram illustrating a storage device according to a first embodiment of the present invention.

FIG. 16 is a diagram illustrating a storage device according to a second embodiment of the present invention.

FIG. 17 is a diagram illustrating a storage device according to a third embodiment of the present invention.

FIG. 18 is a diagram illustrating a storage device according to a fourth embodiment of the present invention.

FIG. 19 is a diagram illustrating tiles adopted in the HEVC specification.

FIG. 20 is a diagram illustrating a conventional decoding order of the tiles shown in FIG. 19.

DETAILED DESCRIPTION

Certain terms are used throughout the description and following claims to refer to particular components. As one skilled in the art will appreciate, manufacturers may refer to a component by different names. This document does not intend to distinguish between components that differ in name but not function. In the following description and in the claims, the terms “include” and “comprise” are used in an open-ended fashion, and thus should be interpreted to mean “include, but not limited to . . . ”. Also, the term “couple” is intended to mean either an indirect or direct electrical connection. Accordingly, if one device is electrically connected to another device, that connection may be through a direct electrical connection, or through an indirect electrical connection via other devices and connections.

In accordance with the VP8/WebP specification, the input to a VP8/WebP decoder is a sequence of compressed frames each having 2-9 partitions. These partitions begin and end on byte boundaries. The leading partition of a compressed frame (i.e., the partition that is transmitted first) has two subsections: header information that applies to the compressed frame as a whole and per-macroblock prediction information that includes prediction information of each macroblock in the compressed frame. The remaining partitions (1, 2, 4 or 8) contain transform coefficients (e.g., DCT/WHT coefficients) of the residue signal.

Please refer to FIG. 1, which is a diagram illustrating multiple partitions in a compressed frame to be processed by a proposed buffering apparatus of the present invention. The compressed frame 100 is transmitted via a VP8/WebP bitstream, and therefore contains N partitions 102_1-102_N which are sequentially transmitted. That is, the partition 102_1 is the leading partition of the compressed frame 100, and the partition 102_N is the last partition of the compressed frame 100. The partition 102_1 includes header information applied to the whole frame 100, and also includes the prediction information for each MB in the same frame 100. Each of the remaining partitions 102_2-102_N following the partition 102_1 includes transform coefficients of the residue, such as DCT coefficients or WHT coefficients. When there is more than one partition for the transform coefficients, the sizes of the partitions in bytes (except that of the last partition) are also present in the bitstream right after the above-mentioned leading partition 102_1. Each of the partition sizes is recorded by a 3-byte data item. For example, a 3-byte partition size PS₂ shown in FIG. 1 indicates the size of the partition 102_2, and a 3-byte partition size PS₃ shown in FIG. 1 indicates the size of the partition 102_3. These partition sizes provide the decoding apparatus direct access to all DCT/WHT coefficient partitions, which may enable parallel processing of the coefficients in a decoding apparatus.
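As an illustration of how the 3-byte partition sizes give direct access to every coefficient partition, the following minimal Python sketch converts the size fields into (offset, size) pairs. The function and parameter names are hypothetical, and the little-endian byte order is an assumption made for this sketch, since the byte order is not stated in the description above.

```python
def partition_spans(size_fields: bytes, num_partitions: int, data_len: int):
    """Return (offset, size) for each coefficient partition.

    size_fields holds one 3-byte size per partition except the last one;
    data_len is the total length in bytes of all coefficient partitions.
    Offsets are relative to the start of the first coefficient partition.
    """
    if len(size_fields) < 3 * (num_partitions - 1):
        raise ValueError("size table is too short")
    spans = []
    offset = 0
    for i in range(num_partitions - 1):
        # Assumed little-endian 3-byte size field.
        size = int.from_bytes(size_fields[3 * i:3 * i + 3], "little")
        spans.append((offset, size))
        offset += size
    # The last partition has no size field: it occupies the remaining bytes.
    spans.append((offset, data_len - offset))
    return spans


fields = (5).to_bytes(3, "little") + (7).to_bytes(3, "little")
spans = partition_spans(fields, num_partitions=3, data_len=20)
print(spans)  # [(0, 5), (5, 7), (12, 8)]
```

With the spans known in advance, each coefficient partition can be located without parsing the partitions that precede it.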

However, VP8/WebP packs the DCT/WHT coefficients from macroblock (MB) rows into separate partitions. Please refer to FIG. 2, which is a diagram illustrating how transform coefficients in a compressed frame are packed into four partitions. As shown in the figure, there are many MB rows MB_0-MB_15 in the exemplary compressed frame 200. The transform coefficients of the MB rows MB_0, MB_4, MB_8, and MB_12 are packed in a partition Partition_1, the transform coefficients of the MB rows MB_1, MB_5, MB_9, and MB_13 are packed in a partition Partition_2, the transform coefficients of the MB rows MB_2, MB_6, MB_10, and MB_14 are packed in a partition Partition_3, and the transform coefficients of the MB rows MB_3, MB_7, MB_11, and MB_15 are packed in a partition Partition_4. Therefore, as successive MB rows are not packed in the same partition, decoding of one MB may require data read from different partitions. In a case where the employed bitstream buffer does not have enough storage space for buffering data of the whole compressed frame, certain data requested by the decoder may not be immediately available in the bitstream buffer. As a result, the bitstream buffer may have to release the buffered data of one partition and then load the requested data in another partition.
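The row-to-partition packing of FIG. 2 amounts to a modulo mapping, which the following short Python sketch makes explicit. The function name is illustrative, and the 1-based partition numbering simply mirrors the labels Partition_1 to Partition_4 used above.

```python
def partition_of_mb_row(mb_row: int, num_partitions: int = 4) -> int:
    """Return the 1-based coefficient partition holding the given MB row."""
    return mb_row % num_partitions + 1


# Rebuild the layout described for the exemplary compressed frame 200.
layout = {}
for row in range(16):
    layout.setdefault(partition_of_mb_row(row), []).append(row)

print(layout[1])  # [0, 4, 8, 12]
print(layout[4])  # [3, 7, 11, 15]
```

Because consecutive MB rows always map to different partitions, a decoder that proceeds row by row switches partitions at every row.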

However, switching between different partitions would lower the decoding speed due to the time period needed for loading the requested data. Thus, to improve the efficiency of decoding each compressed frame of a multi-partition VP8/WebP bitstream, the present invention proposes an innovative buffer maintenance and control mechanism. Further details are described below.

FIG. 3 is a diagram illustrating a video/image decoding system according to a first exemplary embodiment of the present invention. The video/image decoding system 300 includes a buffering apparatus 302 and a decoding apparatus 304. The buffering apparatus 302 is for buffering a multi-partition video/image bitstream BS_IN which transmits a plurality of compressed frames each having a plurality of partitions. In this exemplary embodiment, the buffering apparatus 302 includes a plurality of bitstream buffers 312_1-312_N, a buffer controller 314, and a multiplexer (MUX) 315. The bitstream buffers 312_1-312_N are arranged to buffer data of the partitions 102_1-102_N shown in FIG. 1, respectively. The bitstream data is stored into the bitstream buffers 312_1-312_N according to write pointers WPTR_1-WPTR_N, and the bitstream data is read from the bitstream buffers 312_1-312_N according to read pointers RPTR_1-RPTR_N. More specifically, the write pointer WPTR_1 controls the write address at which the header information/per-macroblock prediction information is stored into the bitstream buffer 312_1, and the read pointer RPTR_1 controls the read address at which the buffered header information/per-macroblock prediction information of the partition 102_1 is read from the bitstream buffer 312_1; the write pointer WPTR_2 controls the write address at which the transform coefficient (e.g., a DCT/WHT coefficient) of the partition 102_2 is stored into the bitstream buffer 312_2, and the read pointer RPTR_2 controls the read address at which the buffered transform coefficient is read from the bitstream buffer 312_2; and the write pointer WPTR_N controls the write address at which the transform coefficient (e.g., a DCT/WHT coefficient) of the partition 102_N is stored into the bitstream buffer 312_N, and the read pointer RPTR_N controls the read address at which the buffered transform coefficient is read from the bitstream buffer 312_N.

In this exemplary embodiment, the bitstream buffers 312_1-312_N may be continuous/discontinuous ring buffers dedicated to buffering data of the partitions 102_1-102_N, respectively, and data is allowed to be fed into a ring buffer when the ring buffer has free storage space (i.e., the write pointer has not yet caught up with the read pointer). In one exemplary design, the buffer controller 314 is arranged to monitor the write pointers WPTR_1-WPTR_N and the read pointers RPTR_1-RPTR_N of all bitstream buffers 312_1-312_N at the same time. Therefore, when the buffer controller 314 detects any bitstream buffer that has free space for receiving more data that is not buffered yet, the buffer controller 314 adjusts the corresponding write pointer and allows data that is not buffered yet to be filled into the bitstream buffer.
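A ring buffer with a write pointer and a read pointer can be sketched in Python as follows. This is a simplified software model, not the disclosed circuit: the class name is hypothetical, and the separate `used` counter is an assumed convenience for telling a full buffer from an empty one when both pointers hold the same value.

```python
class RingBuffer:
    """Byte ring buffer addressed by a write pointer and a read pointer."""

    def __init__(self, capacity: int):
        self.data = bytearray(capacity)
        self.capacity = capacity
        self.wptr = 0   # next address to write
        self.rptr = 0   # next address to read
        self.used = 0   # assumed helper: number of buffered bytes

    def free_space(self) -> int:
        return self.capacity - self.used

    def write(self, chunk: bytes) -> int:
        """Store as much of chunk as fits; return the number of bytes stored."""
        n = min(len(chunk), self.free_space())
        for byte in chunk[:n]:
            self.data[self.wptr] = byte
            self.wptr = (self.wptr + 1) % self.capacity  # wrap around
        self.used += n
        return n

    def read(self, nbytes: int) -> bytes:
        """Consume up to nbytes buffered bytes, freeing their storage."""
        n = min(nbytes, self.used)
        out = bytearray()
        for _ in range(n):
            out.append(self.data[self.rptr])
            self.rptr = (self.rptr + 1) % self.capacity  # wrap around
        self.used -= n
        return bytes(out)


rb = RingBuffer(4)
stored = rb.write(b"abcdef")   # only 4 bytes fit
first_two = rb.read(2)         # frees 2 bytes
rb.write(b"ef")                # refill wraps to the start of the storage
rest = rb.read(4)
print(stored, first_two, rest)  # 4 b'ab' b'cdef'
```

A controller that monitors all buffers would call `free_space()` on each one and feed pending data of the matching partition into any buffer that reports room.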

In another exemplary design, the buffer controller 314 is arranged to only monitor the write pointer and the read pointer of a currently used bitstream buffer in which the buffered data is being decoded. Therefore, when the buffer controller 314 detects that the currently used bitstream buffer has free space for receiving more data that is not buffered yet, the buffer controller 314 adjusts the corresponding write pointer and allows data that is not buffered yet to be filled into the currently used bitstream buffer.

In the above-mentioned exemplary designs, a read pointer and a write pointer of a specific bitstream buffer are used to determine/detect whether the specific bitstream buffer is full or empty or to determine/detect how much free storage space remains in the specific bitstream buffer. However, this is for illustrative purposes only, and is not meant to be a limitation of the present invention. Using other means capable of determining/detecting whether the specific bitstream buffer is full or empty, or of determining/detecting how much free storage space remains in the specific bitstream buffer, is also feasible.

The decoding apparatus 304 includes a plurality of bitstream direct memory access (DMA) controllers 316, 317, and a plurality of barrel shifters 318, 319. The bitstream DMA controller 316 is arranged to transmit buffered bitstream data (i.e., header information/per-macroblock prediction information) from the bitstream buffer 312_1 to the barrel shifter 318 via DMA manner, and the barrel shifter 318 is arranged to parse the bitstream data provided by the preceding bitstream DMA controller 316. The bitstream DMA controller 317 is arranged to transmit buffered bitstream data (i.e., coefficient data) from one of the bitstream buffers 312_2-312_N to the barrel shifter 319 via DMA manner, and the barrel shifter 319 is arranged to parse the bitstream data provided by the preceding bitstream DMA controller 317. Therefore, the decoding apparatus 304 shown in FIG. 3 is capable of decoding two partitions simultaneously.

As only one of the coefficient partitions (i.e., partitions 102_2-102_N) is allowed to be decoded by the decoding apparatus 304 at a time, the buffering apparatus 302 uses the multiplexer 315 to select one of the bitstream buffers 312_2-312_N as a data source to be accessed by the bitstream DMA controller 317. For example, when the coefficient data of the partition 102_2 is required to be processed at a first time point, the multiplexer 315 couples the bitstream buffer 312_2 to the bitstream DMA controller 317. However, when the coefficient data of the partition 102_3 is required to be processed at a second time point, the multiplexer 315 couples the bitstream buffer 312_3 to the bitstream DMA controller 317. As the requested coefficient data may be guaranteed to be available in the bitstream buffers (e.g., ring buffers) 312_2-312_N if each of the bitstream buffers 312_2-312_N is properly controlled to buffer data to be decoded when there is free storage space, the buffering apparatus 302 is not required to release buffered data of one partition and load requested data in another partition. To put it another way, the decoding performance may be greatly improved due to the buffering mechanism which employs multiple bitstream buffers dedicated to buffering partial data of respective partitions, thus avoiding frequent releasing of buffered data and loading of requested data.

Please note that the circuit configuration shown in FIG. 3 merely serves as one exemplary embodiment of the present invention. Any alternative design that does not depart from the spirit of the present invention falls within the scope of the present invention. For example, the spirit of the present invention is followed as long as the buffering apparatus includes multiple bitstream buffers arranged to buffer data of different partitions in the same compressed frame, respectively. For example, in one alternative design, the buffering apparatus 302 is modified to include the bitstream buffer 312_1 used for buffering bitstream data of the partition 102_1, at least one of the bitstream buffers 312_2-312_N used for buffering at least one of the partitions 102_2-102_N, and a single bitstream buffer used for buffering bitstream data of the rest of the partitions 102_2-102_N. In another alternative design, the buffering apparatus 302 is modified to include at least two of the bitstream buffers 312_2-312_N used for buffering at least two of the partitions 102_2-102_N, and a single bitstream buffer used for buffering bitstream data of the partition 102_1 and bitstream data of the rest of the partitions 102_2-102_N. The objective of improving the decoding performance of the decoding apparatus 304 is also achieved.

The decoding performance of the decoding apparatus 304 may be further improved by utilizing a buffering apparatus with a prefetch mechanism employed therein. Please refer to FIG. 4, which is a diagram illustrating a video/image decoding system according to a second exemplary embodiment of the present invention. The major difference between the video/image decoding systems 300 and 400 is that the buffering apparatus 402 shown in FIG. 4 has a prefetch circuit 404 included therein. In this exemplary embodiment, the prefetch circuit 404 includes a prefetch unit 406 and a storage unit 408. The prefetch unit 406 is arranged to prefetch data from a bitstream buffer in which the coefficient data of a next partition to be processed is stored and store the prefetched data into the storage unit 408 while the decoding apparatus 304 is decoding a current partition, wherein the prefetched data stored in the storage unit 408 is read by the decoding apparatus 304 when the decoding apparatus 304 starts decoding the next partition. The storage unit 408 may be an internal buffer of the decoding apparatus 304. Thus, a data access speed of the storage unit 408 could be faster than a data access speed of each of the bitstream buffers 312_2-312_N. For example, the storage unit 408 may be implemented by a register or a static random access memory (SRAM). When the decoding apparatus 304 switches to decoding of the next partition, the time period needed for fetching the coefficient data of the next partition from one of the bitstream buffers 312_2-312_N can be saved/reduced due to the prefetched data available in the storage unit 408. In other words, the time period needed for fetching the coefficient data of the next partition is covered in the time period during which the current partition is decoded. Thus, the use of the prefetch circuit 404 is capable of speeding up the overall decoding process.
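The interplay between the prefetch unit, the storage unit and the bitstream buffers can be modeled by the following minimal Python sketch. It is a behavioral illustration under assumptions, not the disclosed circuit: the class and method names are hypothetical, each bitstream buffer is modeled as a plain `bytearray`, and returning unread prefetched bytes to their buffer before a new prefetch is a design choice made only to keep the sketch lossless.

```python
class Prefetcher:
    """Models a small fast storage unit filled ahead of a partition switch."""

    def __init__(self, bitstream_buffers, storage_size: int):
        self.buffers = bitstream_buffers   # partition id -> bytearray
        self.storage_size = storage_size
        self.storage_partition = None
        self.storage = bytearray()

    def prefetch(self, next_partition):
        """Copy the head of the next partition's buffer into the storage unit."""
        if self.storage:
            # Assumed policy: return unread prefetched bytes to their buffer.
            self.buffers[self.storage_partition][:0] = self.storage
        src = self.buffers[next_partition]
        self.storage = src[:self.storage_size]
        del src[:self.storage_size]
        self.storage_partition = next_partition

    def read(self, partition, nbytes: int) -> bytes:
        """Serve a decoder request, using prefetched data first if it matches."""
        out = bytearray()
        if partition == self.storage_partition:
            out += self.storage[:nbytes]
            del self.storage[:nbytes]
        missing = nbytes - len(out)
        if missing > 0:
            src = self.buffers[partition]
            out += src[:missing]
            del src[:missing]
        return bytes(out)


bufs = {2: bytearray(b"AAAA"), 3: bytearray(b"BBBBBB")}
p = Prefetcher(bufs, storage_size=4)
p.prefetch(3)                 # done while partition 2 is being decoded
current = p.read(2, 2)        # current partition still comes from its buffer
switched = p.read(3, 5)       # first 4 bytes come from the storage unit
print(current, switched)      # b'AA' b'BBBBB'
```

In hardware the benefit comes from the storage unit being faster than the bitstream buffers; the sketch only shows the order in which the two sources are consulted.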

In the exemplary embodiment shown in FIG. 4, the prefetch mechanism is employed for prefetching the next partition's data to be decoded by the following decoding apparatus. However, the same concept may be applied to prefetching the next partition's data to be buffered by one of the bitstream buffers. Please refer to FIG. 5, which is a diagram illustrating an alternative design of the buffering apparatus 302 shown in FIG. 3. The buffering apparatus 502 includes a prefetch circuit 504 and the aforementioned bitstream buffers 312_1-312_N and multiplexer 315. The prefetch circuit 504 is arranged to concurrently monitor one of the bitstream buffers 312_1-312_N that is buffering coefficient data of a current partition and one or more of the bitstream buffers 312_1-312_N that are used for buffering coefficient data of next partitions, and to request more data from a previous stage (e.g., Internet, middleware, or disk) when one or more of the bitstream buffers that are used for buffering coefficient data of next partitions have free storage space available for buffering prefetched data. To put it simply, the prefetch circuit 504 is arranged to prefetch data and store the prefetched data into at least a next partition bitstream buffer while a current partition bitstream buffer is buffering the coefficient data of the current partition processed by the following decoding apparatus. Therefore, with the help of the implemented prefetch mechanism, the bitstream buffering efficiency of the buffering apparatus is improved.

In the above exemplary embodiments, the buffering apparatus with the prefetch mechanism employed therein has N bitstream buffers dedicated to buffering data of respective partitions, where N may be any positive integer greater than 1. However, the proposed prefetch mechanism may also be employed in a buffering apparatus with a single bitstream buffer used for buffering data of a plurality of partitions.

Please refer to FIG. 6, which is a diagram illustrating a video/image decoding system according to a third exemplary embodiment of the present invention. The video/image decoding system 600 includes a buffering apparatus 602 and a decoding apparatus 604, wherein the buffering apparatus 602 includes a single bitstream buffer 612 and the aforementioned prefetch circuit 404, and the decoding apparatus 604 includes a single bitstream DMA controller 616 and a single barrel shifter 618. In this exemplary embodiment, the single bitstream buffer 612 is not a ring buffer. Besides, the bitstream size of the compressed frame 100 may be large. Thus, in a case where the buffer size of the single bitstream buffer 612 is smaller than the bitstream size of the compressed frame 100, the single bitstream buffer 612 only buffers partial data of the compressed frame 100 (i.e., data of a current partition and next partition(s) of the compressed frame 100). Though the single bitstream buffer 612 may need to switch between partitions for loading requested data from a previous stage (e.g., Internet, middleware, or disk), the use of the prefetch circuit 404 is capable of improving the decoding efficiency of the decoding apparatus 604 by immediately feeding the requested data of the next partition to the decoding apparatus 604 when decoding of the next partition is started.

Moreover, no matter what the buffer size of the single bitstream buffer 612 is (e.g., smaller than/bigger than/equal to the bitstream size of the compressed frame 100), the use of the prefetch circuit 404 is still capable of speeding up the overall decoding process. As a person skilled in the art should readily understand operations of the decoding apparatus 604 and the prefetch circuit 404 after reading the above paragraphs, further description is omitted here for brevity.

FIG. 7 is a diagram illustrating a video/image decoding system according to a fourth exemplary embodiment of the present invention. The video/image decoding system 700 includes the aforementioned buffering apparatus 602 and decoding apparatus 304. Specifically, the buffering apparatus 602 includes a single bitstream buffer 612 and a prefetch circuit 404, and the decoding apparatus 304 includes a plurality of bitstream DMA controllers 316, 317 and a plurality of barrel shifters 318, 319. Compared to the decoding apparatus 604 shown in FIG. 6, the decoding apparatus 304 shown in FIG. 7 is capable of decoding two partitions simultaneously. The combination of the bitstream DMA controller 316 and barrel shifter 318 is used for processing header information and per-macroblock prediction information contained in the partition 102_1, and the combination of the bitstream DMA controller 317 and barrel shifter 319 is used for processing coefficient data contained in the partitions 102_2-102_N. Similarly, though the single bitstream buffer 612 may need to switch between partitions for loading requested data from a previous stage (e.g., Internet, middleware, or disk), the use of the prefetch circuit 404 is capable of improving the decoding efficiency of the decoding apparatus 304 by immediately feeding the requested data of the next partition to the decoding apparatus 304 when decoding of the next partition is started.

Moreover, no matter what the buffer size of the single bitstream buffer 612 is (e.g., smaller than/bigger than/equal to the bitstream size of the compressed frame 100), the use of the prefetch circuit 404 is still capable of speeding up the overall decoding process. As a person skilled in the art should readily understand operations of the decoding apparatus 304 and the prefetch circuit 404 after reading the above paragraphs, further description is omitted here for brevity.

Please note that the above-mentioned exemplary embodiments are directed to buffering and decoding a multi-partition VP8/WebP bitstream. However, this is not meant to be a limitation of the present invention. The proposed buffering mechanism and/or prefetch mechanism may be employed for processing any multi-partition based bitstream.

FIG. 8 is a diagram illustrating a video/image decoding system according to a fifth exemplary embodiment of the present invention. By way of example, the video/image decoding system 800 may be employed to process a multi-tile video/image bitstream BS_IN′ complying with an HEVC specification or a JPEG-XR specification. As a multi-tile encoded picture of a JPEG-XR bitstream has a tile configuration similar to that of a multi-tile encoded picture of an HEVC bitstream, the multi-tile JPEG-XR bitstream may be processed using the proposed buffering and/or decoding method applied to the multi-tile HEVC bitstream. The video/image decoding system 800 includes a buffering apparatus 802 and a decoding apparatus 804. The buffering apparatus 802 is for buffering the multi-tile video/image bitstream BS_IN′ which transmits a plurality of compressed/encoded frames PIC_IN each having a plurality of tiles. In this exemplary embodiment, the buffering apparatus 802 includes a storage device 812, a buffer controller 814, and a pointer buffer 816, where the storage device 812 may include one or more bitstream buffers, depending upon actual design consideration. The multi-tile video/image bitstream BS_IN′ is stored into the storage device 812 under the control of the buffer controller 814. Specifically, the pointer buffer 816 may store a write pointer and one or more read pointers for each bitstream buffer implemented in the storage device 812, and the buffer controller 814 refers to the write pointer and the read pointer to determine whether the corresponding bitstream buffer has free storage space for accommodating data of the multi-tile video/image bitstream BS_IN′.

The decoding apparatus 804 is used to decode each multi-tile encoded picture PIC_IN transmitted via the multi-tile video/image bitstream BS_IN′. In this embodiment, the decoding apparatus 804 includes a bitstream DMA controller 822 and an entropy decoder 824. In addition to controlling data buffering of the multi-tile video/image bitstream BS_IN′ in the storage device 812, the buffer controller 814 further outputs a read pointer PTR_C to inform the bitstream DMA controller 822 of the access position of the requested data (e.g., an LCU/TB/MB to be decoded) in the storage device 812. Hence, the bitstream DMA controller 822 refers to the currently used read pointer PTR_C to transfer the requested data from the storage device 812 to the entropy decoder 824 for entropy decoding. It should be noted that the read pointer PTR_C will be updated each time requested data (i.e., one requested LCU/TB/MB) has been read from the storage device 812.

It should be noted that the proposed read pointer maintenance scheme may be employed by a decoding operation of independent tiles or a decoding operation of dependent tiles. In the following, an example of decoding dependent tiles is provided for illustrative purposes only, and is not meant to be a limitation of the present invention.

Please refer to FIG. 9, which is a diagram illustrating an exemplary entropy decoding operation performed by the entropy decoder 824 shown in FIG. 8. Suppose that a multi-tile encoded picture PIC_IN to be decoded is derived from a multi-tile HEVC bitstream. Hence, the multi-tile encoded picture PIC_IN is partitioned into a plurality of tiles (e.g., nine dependent tiles T₁₁-T₃₃ in this embodiment). Each of the tiles T₁₁-T₃₃ is composed of a plurality of LCUs/TBs. If a conventional decoding manner is employed, the LCU/TB index values shown in FIG. 9 indicate the conventional decoding order of the LCUs/TBs included in the multi-tile encoded picture PIC_IN. Specifically, regarding a conventional decoder design, the decoding order in a multi-tile encoded picture with tiles has a raster scan sequence for LCUs/TBs in each tile and a raster scan sequence for the tiles. To put it another way, the conventional decoding order is identical to a transmission order of the LCUs/TBs included in the multi-tile encoded picture PIC_IN. That is, the LCUs/TBs in the same tile are successively transmitted in a raster scan sequence, and the tiles are successively transmitted in a raster scan sequence. In contrast to the conventional decoder design, the proposed decoder design of the present invention has the entropy decoder 824 configured to decode all LCUs/TBs of the whole multi-tile encoded picture PIC_IN in a raster scan manner, where the decoding order includes successive decoding sequences S1-S8 as shown in FIG. 9. For example, the LCUs/TBs, located at the first row shown in FIG. 9 and belonging to different tiles T₁₁, T₁₂ and T₁₃, are sequentially decoded from the left-most LCU/TB to the right-most LCU/TB as indicated by the decoding sequence S1; the LCUs/TBs, located at the second row shown in FIG. 9 and belonging to different tiles T₁₁, T₁₂ and T₁₃, are sequentially decoded from the left-most LCU/TB to the right-most LCU/TB as indicated by the decoding sequence S2 following the decoding sequence S1; and the LCUs/TBs, located at the third row shown in FIG. 9 and belonging to different tiles T₁₁, T₁₂ and T₁₃, are sequentially decoded from the left-most LCU/TB to the right-most LCU/TB as indicated by the decoding sequence S3 following the decoding sequence S2. In other words, the proposed decoding order employed by the entropy decoder 824 is different from the transmission order of the LCUs/TBs included in the multi-tile encoded picture PIC_IN.
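The difference between the transmission order and the proposed picture-wide raster decoding order can be illustrated with a minimal Python sketch. The tile column widths (4, 6 and 3 LCUs/TBs) and the height of the first tile row (3) follow from the index values quoted in this description; the heights of the second and third tile rows (3 and 2) are assumptions chosen only so that the picture has eight LCU/TB rows, and the function names are hypothetical.

```python
from itertools import accumulate


def transmission_index_map(col_widths, row_heights):
    """Map each (row, col) LCU/TB position to its 1-based transmission index.

    Tiles are transmitted in raster order, and the LCUs/TBs inside each
    tile are transmitted in raster order as well.
    """
    col_starts = [0] + list(accumulate(col_widths))[:-1]
    row_starts = [0] + list(accumulate(row_heights))[:-1]
    index = {}
    next_index = 1
    for r0, h in zip(row_starts, row_heights):        # tile rows
        for c0, w in zip(col_starts, col_widths):     # tiles within a tile row
            for r in range(r0, r0 + h):               # LCU/TB rows within a tile
                for c in range(c0, c0 + w):           # LCUs/TBs within a row
                    index[(r, c)] = next_index
                    next_index += 1
    return index


def picture_raster_order(col_widths, row_heights):
    """List transmission indices in picture-wide raster (decoding) order."""
    index = transmission_index_map(col_widths, row_heights)
    width, height = sum(col_widths), sum(row_heights)
    return [index[(r, c)] for r in range(height) for c in range(width)]


# Row heights of the second and third tile rows are assumed, not taken from FIG. 9.
order = picture_raster_order([4, 6, 3], [3, 3, 2])
print(order[:14])
```

Under these assumptions the first decoding sequence visits the LCUs/TBs indexed 1-4, 13-18 and 31-33 before returning to the LCU/TB indexed 5, which matches the order described for the decoding sequences S1 and S2.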

In this embodiment, data of the LCUs/TBs is encoded using a context-based adaptive binary arithmetic coding (CABAC) algorithm. Hence, the context model, which is a probability model, should be properly selected and updated during the entropy decoding of the multi-tile encoded picture PIC_IN. It should be noted that the entropy decoder 824 is configured to initialize the CABAC statistics at the first LCU/TB of each tile. That is, the CABAC statistics at the first LCU/TB of a current tile may be inherited from the CABAC statistics at a specific LCU/TB of a previous tile horizontally adjacent to the current tile, where the first LCU/TB and the specific LCU/TB are horizontally adjacent to each other and located at opposite sides of a tile boundary (i.e., a vertical/column boundary) between the current tile and the previous tile. As can be seen from FIG. 9, the initial CABAC statistics at the first LCU/TB indexed by “13” in the tile T₁₂ is inherited from the CABAC statistics updated at the LCU/TB indexed by “4” in the tile T₁₁; similarly, the initial CABAC statistics at the first LCU/TB indexed by “31” in the tile T₁₃ is inherited from the CABAC statistics updated at the LCU/TB indexed by “18” in the tile T₁₂. The tiles T₁₁-T₁₃ are horizontally adjacent tiles, i.e., horizontal partitions. However, the tiles T₁₁, T₂₁, and T₃₁ are vertically adjacent tiles, i.e., vertical partitions. Regarding the tile T₂₁ which is vertically adjacent to the tile T₁₁, the initial CABAC statistics at the first LCU/TB indexed by “40” in the tile T₂₁ would be inherited from the CABAC statistics updated at the last LCU/TB indexed by “39” in the tile T₁₃. As the initial setting of the CABAC statistics for the rest of the tiles can be easily deduced by analogy, further description is omitted for brevity.

As the entropy decoder 824 employs the decoding order including successive decoding sequences S1-S8, the LCUs/TBs in the same tile are not decoded continuously due to the fact that the entropy decoder 824 starts decoding a portion of a current tile after decoding a portion of a previous tile. As can be seen from FIG. 9, after the LCUs/TBs indexed by “1”, “2”, “3” and “4” of the tile T₁₁ are successively decoded, the next LCU/TB to be decoded by the entropy decoder 824 would be the first LCU/TB indexed by “13” in the next tile T₁₂ rather than the LCU/TB indexed by “5” in the current tile T₁₁; after the LCUs/TBs indexed by “13”, “14”, “15”, “16”, “17” and “18” of the tile T₁₂ are successively decoded, the next LCU/TB to be decoded by the entropy decoder 824 would be the first LCU/TB indexed by “31” in the next tile T₁₃ rather than the LCU/TB indexed by “19” in the current tile T₁₂; and after the LCUs/TBs indexed by “31”, “32” and “33” of the tile T₁₃ are successively decoded, the next LCU/TB to be decoded by the entropy decoder 824 would be the LCU/TB indexed by “5” in the previously processed tile T₁₁ rather than the LCU/TB indexed by “34” in the current tile T₁₃. Though each tile has a plurality of LCUs/TBs successively transmitted and stored into the storage device 812, the LCUs/TBs of the same tile are not decoded continuously due to the proposed decoding order shown in FIG. 9. Hence, the buffer controller 814 should be properly designed for offering desired read pointer maintenance of the buffering apparatus 802.

Please refer to FIG. 10, which is a diagram illustrating a buffer controller according to an embodiment of the present invention. The buffer controller 814 shown in FIG. 8 may be realized by the buffer controller 1000 shown in FIG. 10. In this embodiment, the buffer controller 1000 includes a judging unit 1002, a control unit 1004, and a multiplexer (MUX) 1006. The judging unit 1002 is arranged for judging if decoding of a current tile of the multi-tile encoded picture PIC_IN encounters a tile boundary (e.g., a right vertical/column boundary) of the current tile, and accordingly generating a judgment result JR. For example, the judging unit 1002 may actively monitor the entropy decoding operation performed by the decoding apparatus 804 to judge if the tile boundary is encountered, or may passively receive an entropy decoding status provided by the decoding apparatus 804 to judge if the tile boundary is encountered.

The control unit 1004 is arranged for storing a currently used read pointer PTR_C into the pointer buffer 816 and loading a selected read pointer from the pointer buffer 816 to act as the currently used read pointer PTR_C when the judgment result JR indicates that the tile boundary is encountered, where the selected read pointer loaded from the pointer buffer 816 may be a read pointer of a next tile to be decoded immediately after the current tile. As shown in FIG. 10, the control unit 1004 generates a selection signal SEL to the MUX 1006 to control which one of the read pointers RP₁, RP₂, RP₃-RP_N maintained in the pointer buffer 816 is selected and loaded as the currently used read pointer PTR_C.

By way of example, but not limitation, the number of read pointers maintained in the pointer buffer 816 during entropy decoding of the multi-tile encoded picture PIC_IN depends on the partitioning setting of the multi-tile encoded picture PIC_IN. For example, when the multi-tile encoded picture PIC_IN has N horizontally adjacent partitions (i.e., N horizontal partitions/tiles at the same row), the number of read pointers maintained in the pointer buffer 816 during entropy decoding of the multi-tile encoded picture is equal to N. Regarding the example shown in FIG. 9, N is equal to 3. Hence, there are 3 read pointers (e.g., RP₁-RP₃) concurrently maintained in the pointer buffer 816, where each of the read pointers indicates an access position in the storage device 812.

The read pointers RP₁-RP_N may be initialized by referring to the header information transmitted via the multi-tile video/image bitstream BS_IN′. FIG. 11 is a diagram illustrating a sketch map of the multi-tile video/image bitstream BS_IN′ according to an embodiment of the present invention. The tile size of each tile included in the multi-tile encoded picture PIC_IN is recorded in the header information section. These tile sizes provide the information needed for calculating the offset (e.g., an entry point offset) of the n-th tile from the start of the multi-tile encoded picture PIC_IN. Thus, when the tiles are sequentially stored into bitstream buffer(s) of the storage device 812, the storage location of the start of each tile can be readily obtained and used for setting the initial value of a corresponding read pointer in the pointer buffer 816.
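The offset calculation described above amounts to a running sum of the tile sizes. The following Python sketch shows one way it could be done, assuming that the tiles are stored back to back in one linear buffer with no other bytes between them; the function name, the base address and the tile sizes are hypothetical values used only for illustration.

```python
from itertools import accumulate


def initial_read_pointers(buffer_base, tile_sizes, tiles_per_row):
    """Return (start address of every tile, initial read pointers of the first tile row).

    Assumes tiles are stored contiguously, in transmission order,
    starting at buffer_base.
    """
    # Entry point offset of tile n = sum of the sizes of tiles 0..n-1.
    offsets = [0] + list(accumulate(tile_sizes))[:-1]
    tile_starts = [buffer_base + offset for offset in offsets]
    return tile_starts, tile_starts[:tiles_per_row]


# Hypothetical tile sizes in bytes for a 3x3 tile layout.
sizes = [120, 200, 90, 110, 180, 80, 70, 150, 60]
starts, rp = initial_read_pointers(0x1000, sizes, tiles_per_row=3)
print([hex(p) for p in rp])
```

Here `rp` plays the role of the initial values of RP₁-RP₃ for the first tile row, and the remaining entries of `starts` could be loaded when decoding moves on to the next tile row.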

In accordance with the HEVC specification, all slices within a tile shall be complete or all tiles within a slice shall be complete. The HEVC bitstream structure shown in FIG. 11 is for a slice having a plurality of tiles included therein. However, based on the HEVC specification, it is possible that one tile may have a plurality of slices included therein. The aforementioned entry point offset based initialization method for the read pointers RP₁-RP_N is not applicable to the case where one tile has a plurality of slices included therein. Please refer to FIG. 12 in conjunction with FIG. 13. FIG. 12 is a diagram illustrating a plurality of tiles each including a plurality of slices according to an embodiment of the present invention. FIG. 13 is a diagram illustrating another sketch map of the multi-tile video/image bitstream BS_IN′ according to an embodiment of the present invention. As shown in FIG. 12, one tile Tile_0 includes a plurality of slices Slice_0 and Slice_1, and another tile Tile_1 includes a plurality of slices Slice_2 and Slice_3. As shown in FIG. 13, the slices Slice_0-Slice_3 are sequentially transmitted and stored into bitstream buffer(s) of the storage device 812. Regarding the case where one tile has a plurality of slices included therein, the present invention proposes initializing the read pointers RP₁-RP_N by referring to the slice addresses. Thus, when the slices are sequentially stored into bitstream buffer(s) of the storage device 812, the storage location of the start of each tile can be readily obtained from the slice address of the first slice included in the tile. For example, the slice address of the slice Slice_2 of the tile Tile_1 can be used for setting the initial value of a corresponding read pointer in the pointer buffer 816. The same objective of initializing a read pointer of each tile is achieved.
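The slice-based initialization can be sketched in Python as follows. The sketch assumes that, for each stored slice, the buffering apparatus knows which tile the slice belongs to and where the slice starts in the storage device; the record layout, the function name and the storage locations are hypothetical.

```python
def read_pointers_from_slices(slices):
    """Derive one initial read pointer per tile from stored slices.

    `slices` is an iterable of (tile_id, storage_location) pairs given in
    transmission order. The first slice seen for a tile marks the start
    of that tile, so later slices of the same tile are ignored.
    """
    pointers = {}
    for tile_id, location in slices:
        pointers.setdefault(tile_id, location)
    return pointers


# Slice_0 and Slice_1 belong to Tile_0; Slice_2 and Slice_3 belong to Tile_1.
# The storage locations are made-up numbers.
stored = [(0, 0), (0, 300), (1, 550), (1, 800)]
pointers = read_pointers_from_slices(stored)
print(pointers)
```

With these made-up locations, the read pointer of Tile_1 is initialized from the start of Slice_2, as in the example given above.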

An exemplary read pointer maintenance operation of the buffering apparatus 802 is described with reference to FIG. 14. Supposing that the multi-tile encoded picture PIC_IN has the partition setting shown in FIG. 9, the number of maintained read pointers is equal to 3 (i.e., N=3). In the beginning, the read pointer RP₁ with an initial value is loaded via the MUX 1006 to act as the currently used read pointer PTR_C referenced by the bitstream DMA controller 822 for reading the LCU/TB indexed by “1” from the storage device 812. When the entropy decoding of the tile T₁₁ encounters a tile boundary (e.g., a vertical/column boundary VB₁) after decoding the LCU/TB indexed by “4”, the currently used read pointer PTR_C pointing to an access location of the subsequent LCU/TB indexed by “5” is stored into the pointer buffer 816 to update the read pointer RP₁ maintained in the pointer buffer 816, and the read pointer RP₂ with an initial value is loaded via the MUX 1006 to act as the currently used read pointer PTR_C referenced by the bitstream DMA controller 822 for reading the LCU/TB indexed by “13” from the storage device 812. When the entropy decoding of the tile T₁₂ encounters a tile boundary (e.g., a vertical/column boundary VB₂) after decoding the LCU/TB indexed by “18”, the currently used read pointer PTR_C pointing to an access location of the subsequent LCU/TB indexed by “19” is stored into the pointer buffer 816 to update the read pointer RP₂ maintained in the pointer buffer 816, and the read pointer RP₃ with an initial value is loaded via the MUX 1006 to act as the currently used read pointer PTR_C referenced by the bitstream DMA controller 822 for reading the LCU/TB indexed by “31” from the storage device 812. When the entropy decoding of the tile T₁₃ encounters a tile boundary (e.g., a vertical/column boundary VB₃) after decoding the LCU/TB indexed by “33”, the currently used read pointer PTR_C pointing to an access location of the subsequent LCU/TB indexed by “34” is stored into the pointer buffer 816 to update the read pointer RP₃ maintained in the pointer buffer 816, and the read pointer RP₁ is loaded via the MUX 1006 to act as the currently used read pointer PTR_C referenced by the bitstream DMA controller 822 for reading the LCU/TB indexed by “5” from the storage device 812. As a person skilled in the art can readily understand loading and storing of the read pointer referenced for reading the following requested LCUs/TBs by referring to FIG. 9, further description is omitted here for brevity.
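The store-and-load sequence walked through above can be simulated with a short Python sketch. It is a simplified model rather than the circuit of FIG. 10: the storage device is a list holding the LCU/TB indices of the first tile row in transmission order, a read pointer is a list position, the judging unit is replaced by counting LCUs/TBs per tile width, and the next tile is chosen round-robin. All class and function names are hypothetical.

```python
class BufferController:
    """Toy model of the control unit, MUX and pointer buffer."""

    def __init__(self, initial_pointers):
        self.pointer_buffer = list(initial_pointers)   # RP_1 .. RP_N
        self.selected = 0                              # models the selection signal SEL
        self.ptr_c = self.pointer_buffer[0]            # currently used read pointer

    def advance(self):
        """Update PTR_C after one LCU/TB has been read."""
        self.ptr_c += 1

    def on_tile_boundary(self):
        """Store PTR_C, then load the read pointer of the next tile."""
        self.pointer_buffer[self.selected] = self.ptr_c
        self.selected = (self.selected + 1) % len(self.pointer_buffer)
        self.ptr_c = self.pointer_buffer[self.selected]


def decode_tile_row(storage, tile_widths, tile_height):
    """Read one row of tiles in picture raster order; return what was read."""
    starts, position = [], 0
    for width in tile_widths:                 # start of each tile in storage
        starts.append(position)
        position += width * tile_height
    controller = BufferController(starts)
    decoded = []
    for _ in range(tile_height):              # one decoding sequence per LCU/TB row
        for width in tile_widths:
            for _ in range(width):
                decoded.append(storage[controller.ptr_c])
                controller.advance()
            controller.on_tile_boundary()     # right boundary of this tile reached
    return decoded, controller.pointer_buffer


storage = list(range(1, 40))                  # LCUs/TBs 1..39 of tiles T11-T13
decoded, final_pointers = decode_tile_row(storage, [4, 6, 3], 3)
print(decoded[:14])
```

After the first boundary the stored value of RP₁ is position 4, which is the location of the LCU/TB indexed by “5”, so the model reproduces the reads 1-4, 13-18, 31-33 and then 5 described in the paragraph above.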

The storage device 812 may be implemented using a single bitstream buffer or multiple bitstream buffers. In a case where the storage device 812 is implemented using multiple bitstream buffers, the total buffer size can be reduced. For example, the multiple bitstream buffers are continuous/discontinuous ring buffers dedicated to buffering LCU/TB/MB data of different tiles, respectively, and the LCU/TB/MB data is allowed to be fed into a ring buffer when the ring buffer has free storage space (i.e., a write pointer of the ring buffer has not yet caught up with a read pointer of the ring buffer).
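The free-space condition for a ring buffer can be sketched as below. This is a generic ring buffer written for illustration, not the circuit of the embodiment; it keeps the write pointer and the read pointer as running counts so that a full buffer and an empty buffer are easy to tell apart, which is a design choice of this sketch.

```python
class RingBuffer:
    """Fixed-capacity ring buffer with one write pointer and one read pointer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = [None] * capacity
        self.write_ptr = 0    # total number of items written so far
        self.read_ptr = 0     # total number of items read so far

    def free_space(self):
        """Slots that can still be written before catching up with the read pointer."""
        return self.capacity - (self.write_ptr - self.read_ptr)

    def push(self, item):
        """Store one item; return False when there is no free storage space."""
        if self.free_space() == 0:
            return False
        self.data[self.write_ptr % self.capacity] = item
        self.write_ptr += 1
        return True

    def pop(self):
        """Read the oldest buffered item."""
        if self.write_ptr == self.read_ptr:
            raise IndexError("ring buffer is empty")
        item = self.data[self.read_ptr % self.capacity]
        self.read_ptr += 1
        return item


ring = RingBuffer(3)
results = [ring.push(x) for x in "abcd"]   # the fourth push finds the buffer full
first = ring.pop()                          # reading frees one slot
retry = ring.push("d")
print(results, first, retry)
```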

FIG. 15 is a diagram illustrating a storage device according to a first embodiment of the present invention. The storage device 812 shown in FIG. 8 may be realized by the storage device 1300 shown in FIG. 15. In this embodiment, the storage device 1300 includes a plurality of bitstream buffers 1302_1, 1302_2, 1302_3-1302_N and a multiplexer (MUX) 1304, wherein the distinct bitstream buffers 1302_1-1302_N provide a plurality of distinct storage spaces for data buffering, respectively. By way of example, but not limitation, the number of bitstream buffers (i.e., storage spaces) implemented in the storage device 1300 depends on the partitioning setting of the multi-tile encoded picture PIC_IN. For example, when the multi-tile encoded picture PIC_IN has N horizontally adjacent partitions (i.e., N horizontal partitions/tiles at the same row), the number of bitstream buffers implemented in the storage device 1300 is equal to N. Regarding the example shown in FIG. 9, N is equal to 3. Hence, there are three bitstream buffers (e.g., 1302_1-1302_3) used for buffering LCU/TB/MB data of three tiles (e.g., T₁₁-T₁₃, T₂₁-T₂₃, or T₃₁-T₃₃), respectively. The bitstream buffers 1302_1-1302_N may be ring buffers.

Besides, the bitstream data is stored into the bitstream buffers 1302_1-1302_N according to write pointers WPTR_1-WPTR_N stored in the pointer buffer 816 and controlled/updated by the buffer controller 814, and the bitstream data is read from the bitstream buffers 1302_1-1302_N according to read pointers RPTR_1-RPTR_N stored in the pointer buffer 816 and controlled/updated by the buffer controller 814. More specifically, in a case where 1st-Nth tiles are horizontally adjacent tiles at the same row, the write pointer WPTR_1 controls the write address at which the LCU/MB data of the 1st tile is stored into the bitstream buffer 1302_1, and the read pointer RPTR_1 controls the read address at which the buffered LCU/MB data of the 1st tile is read from the bitstream buffer 1302_1; the write pointer WPTR_2 controls the write address at which the LCU/MB data of the 2nd tile is stored into the bitstream buffer 1302_2, and the read pointer RPTR_2 controls the read address at which the buffered LCU/MB data of the 2nd tile is read from the bitstream buffer 1302_2; the write pointer WPTR_3 controls the write address at which the LCU/MB data of the 3rd tile is stored into the bitstream buffer 1302_3, and the read pointer RPTR_3 controls the read address at which the buffered LCU/MB data of the 3rd tile is read from the bitstream buffer 1302_3; and the write pointer WPTR_N controls the write address at which the LCU/MB data of the Nth tile is stored into the bitstream buffer 1302_N, and the read pointer RPTR_N controls the read address at which the buffered LCU/MB data of the Nth tile is read from the bitstream buffer 1302_N.

The buffer controller 814 further generates a selection signal SEL′ to the MUX 1304 to select one of the bitstream buffers 1302_1-1302_N as a data source to be accessed by the bitstream DMA controller 822. For example, when the LCU/MB data of the 1st tile is required to be processed by the entropy decoder 824, the MUX 1304 couples the bitstream buffer 1302_1 to the bitstream DMA controller 822. Besides, the buffer controller 814 sets the currently used read pointer PTR_C according to the read pointer RPTR_1 of the selected bitstream buffer 1302_1. However, when the LCU/MB data of the 2nd tile is required to be processed by the entropy decoder 824, the MUX 1304 couples the bitstream buffer 1302_2 to the bitstream DMA controller 822. Besides, the buffer controller 814 sets the currently used read pointer PTR_C according to the read pointer RPTR_2 of the selected bitstream buffer 1302_2. In other words, when the LCU/MB data of a currently decoded tile is retrieved by the bitstream DMA controller 822, the LCU/MB data of other tiles that are not currently decoded is buffered in other bitstream buffers. As the requested LCU/MB data may be guaranteed to be available in the bitstream buffers (e.g., ring buffers) 1302_1-1302_N if each of the bitstream buffers 1302_1-1302_N is properly controlled to buffer data to be decoded when there is free storage space, the storage device 1300 is not required to release buffered data of one tile and load requested data of another tile. In this way, the decoding performance may be greatly improved due to the buffering mechanism which employs multiple bitstream buffers dedicated to buffering partial data of respective tiles, thus avoiding frequent releasing of buffered data and loading of requested data.

Please note that the circuit configuration shown in FIG. 15 merely serves as one exemplary embodiment of the present invention. Any alternative design that does not depart from the spirit of the present invention also falls within the scope of the present invention. For example, the spirit of the present invention is obeyed as long as the buffering apparatus includes multiple bitstream buffers arranged to buffer data of different tiles in the same multi-tile encoded picture, respectively. For example, in one alternative design, the storage device 1300 may be modified to include bitstream buffers respectively used for buffering LCU/MB data of some of the tiles in a multi-tile encoded picture, and a single bitstream buffer used for buffering the rest of the tiles in the multi-tile encoded picture. The same objective of improving the decoding performance of the decoding apparatus is also achieved.

The decoding performance of the decoding apparatus 804 may be further improved by utilizing a buffering apparatus with a prefetch mechanism employed therein. Please refer to FIG. 16, which is a diagram illustrating a storage device according to a second embodiment of the present invention. The storage device 812 shown in FIG. 8 may be realized by the storage device 1400 shown in FIG. 16. The major difference between the storage devices 1300 and 1400 is that the storage device 1400 has a prefetch circuit 1401 included therein. In this exemplary embodiment, the prefetch circuit 1401 includes a prefetch unit 1402 and a storage unit 1404. The prefetch unit 1402 is arranged to prefetch data from a bitstream buffer in which the LCU/TB/MB data of a next tile to be processed is stored and store the prefetched data into the storage unit 1404 while the decoding apparatus 804 is decoding a current tile, wherein the prefetched data stored in the storage unit 1404 is read by the decoding apparatus 804 when the decoding apparatus 804 starts decoding the next tile. By way of example, the storage unit 1404 may be an internal buffer of the decoding apparatus 804. Thus, a data access speed of the storage unit 1404 could be faster than a data access speed of each of the bitstream buffers 1302_1-1302_N. For example, the storage unit 1404 may be implemented by a register or a static random access memory (SRAM). When the decoding apparatus 804 switches to decoding of the next tile, the time period needed for fetching the LCU/MB data of the next tile from one of the bitstream buffers 1302_1-1302_N can be saved/reduced due to the prefetched data available in the storage unit 1404. In other words, the time period needed for fetching the LCU/MB data of the next tile is concealed in the time period during which the current tile is decoded. Thus, the use of the prefetch circuit 1401 is capable of speeding up the overall decoding process.
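The prefetch behavior can be modeled functionally with the Python sketch below. It does not model access latency; it only shows prefetched data being served from a small storage unit before the decoder falls back to the bitstream buffer. Double-ended queues stand in for the bitstream buffer and the storage unit, and the class name, method names and prefetch depth are hypothetical.

```python
from collections import deque


class PrefetchCircuit:
    """Toy model of a prefetch unit plus a small, fast storage unit."""

    def __init__(self, depth):
        self.depth = depth              # capacity of the storage unit
        self.storage_unit = deque()     # holds prefetched data
        self.source = None              # bitstream buffer the data came from

    def prefetch(self, next_tile_buffer):
        """Move data of the next tile into the storage unit.

        Intended to be called while the current tile is being decoded.
        """
        self.source = next_tile_buffer
        while len(self.storage_unit) < self.depth and next_tile_buffer:
            self.storage_unit.append(next_tile_buffer.popleft())

    def read(self, tile_buffer):
        """Return (data, served_from_storage_unit) for the requested tile."""
        if tile_buffer is self.source and self.storage_unit:
            return self.storage_unit.popleft(), True
        return tile_buffer.popleft(), False


next_tile = deque([13, 14, 15, 16])     # buffered data of the next tile
circuit = PrefetchCircuit(depth=2)
circuit.prefetch(next_tile)             # happens during decoding of the current tile
reads = [circuit.read(next_tile) for _ in range(3)]
print(reads)
```

In this model the first two reads after the tile switch are served from the storage unit, and later reads come from the bitstream buffer in the original order.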

In the exemplary embodiment shown in FIG. 16, the prefetch mechanism is employed for prefetching the next tile's data to be decoded by the decoding apparatus. However, the same concept may be applied to prefetching the next tile's data to be buffered into one of the bitstream buffers. Please refer to FIG. 17, which is a diagram illustrating a storage device according to a third embodiment of the present invention. The storage device 812 shown in FIG. 8 may be realized by the storage device 1500 shown in FIG. 17. The storage device 1500 includes a prefetch circuit 1502, and the aforementioned bitstream buffers 1302_1-1302_N and multiplexer 1304. The prefetch circuit 1502 is arranged to concurrently monitor one of the bitstream buffers 1302_1-1302_N that is buffering LCU/MB data of a tile which is currently decoded and one or more of the bitstream buffers 1302_1-1302_N that are used for buffering LCU/MB data of tiles which are not currently decoded, and to request more data from a previous stage (e.g., Internet, middleware, or disk) when the bitstream buffers, which are used for buffering LCU/MB data of tiles that are not currently decoded, have free storage space available for buffering prefetched data. To put it simply, the prefetch circuit 1502 is arranged to prefetch data and store the prefetched data into at least a next tile bitstream buffer while a current tile bitstream buffer is buffering the LCU/MB data of the current tile processed by the decoding apparatus 804. Therefore, with the help of the implemented prefetch mechanism disposed before the bitstream buffers, the bitstream buffering efficiency of the buffering apparatus is improved.

Regarding the above exemplary implementations of the storage device 812 shown in FIG. 8, the storage device 1300/1400/1500 in FIG. 15/FIG. 16/FIG. 17 is implemented using a plurality of bitstream buffers, such as continuous/discontinuous ring buffers, to save the buffer size. However, this is not meant to be a limitation of the present invention. Alternatively, the storage device 812 may be implemented using a single bitstream buffer. Please refer to FIG. 18, which is a diagram illustrating a storage device according to a fourth embodiment of the present invention. The combination of multiple bitstream buffers and one multiplexer shown in FIG. 15/FIG. 16/FIG. 17 may be replaced with the single bitstream buffer 1602 of the storage device 1600. The single bitstream buffer 1602 has a plurality of distinct buffer sections 1604_1, 1604_2, 1604_3-1604_N each providing a storage space for data buffering. One write pointer WPTR controls the write address at which the LCU/MB data of the 1st-Nth tiles is stored into the single bitstream buffer 1602, and each of the read pointers RPTR_1-RPTR_N controls the read address at which the buffered LCU/MB data of a corresponding tile is read from one buffer section of the bitstream buffer 1602. Initially, each of the read pointers RPTR_1-RPTR_N indicates a start point of a corresponding tile in the bitstream buffer 1602. After decoding of a tile is started, a corresponding read pointer will be properly updated to indicate the read address of the buffered LCU/TB data to be decoded.

By way of example, but not limitation, the number of buffer sections (i.e., storage spaces) allocated in the single bitstream buffer 1602 depends on the partitioning setting of the multi-tile encoded picture PIC_IN. For example, when the multi-tile encoded picture PIC_IN has N horizontally adjacent partitions (i.e., N horizontal partitions/tiles at the same row), the number of buffer sections allocated in the single bitstream buffer 1602 is equal to N. Regarding the example shown in FIG. 9, N is equal to 3. Hence, there are three buffer sections (e.g., 1604_1-1604_3) used for buffering LCU/TB/MB data of three tiles (e.g., T₁₁-T₁₃, T₂₁-T₂₃, or T₃₁-T₃₃), respectively.

Please note that the above-mentioned exemplary embodiments are directed to buffering and decoding a multi-tile HEVC/JPEG-XR bitstream. However, this is not meant to be a limitation of the present invention. The proposed buffering mechanism and/or prefetch mechanism may be employed for processing any multi-tile based bitstream.

Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.

What is claimed is:
 1. A method for read pointer maintenance of a buffering apparatus which is arranged to buffer data of a multi-tile encoded picture having a plurality of tiles included therein, the method comprising: judging if decoding of a first tile of the multi-tile encoded picture encounters a tile boundary of the first tile; and when it is judged that the tile boundary of the first tile is encountered, loading a selected read pointer from a pointer buffer to act as a currently used read pointer; wherein the currently used read pointer is indicative of an access position of a requested data to be decoded; when the selected read pointer is loaded from the pointer buffer, the currently used read pointer is changed from a read pointer of the first tile to the selected read pointer; and the currently used read pointer is not changed from the read pointer of the first tile to the selected read pointer until it is judged that the tile boundary of the first tile is encountered.
 2. The method of claim 1, wherein the selected read pointer is a read pointer of a second tile to be decoded immediately after the first tile.
 3. The method of claim 1, wherein the tile boundary is a right vertical boundary.
 4. The method of claim 1, wherein the whole multi-tile encoded picture is decoded in a raster scan order, and decoding of a portion of a second tile starts after decoding of a portion of the first tile is completed.
 5. The method of claim 1, wherein the multi-tile encoded picture complies with a High-Efficiency Video Coding (HEVC) specification or a Joint Photographic Experts Group extended range (JPEG-XR) specification.
 6. The method of claim 1, wherein the multi-tile encoded picture has N horizontally adjacent partitions, a number of read pointers maintained in the pointer buffer is equal to N, and N is a positive integer.
 7. The method of claim 1, wherein when it is judged that the tile boundary of the first tile is encountered, the read pointer of the first tile acting as a previous currently used read pointer is stored into the pointer buffer.
 8. A buffer controller for read pointer maintenance of a buffering apparatus which is arranged to buffer data of at least a multi-tile encoded picture having a plurality of tiles included therein, the buffer controller comprising: a judging unit, arranged for judging if decoding of a first tile of the multi-tile encoded picture encounters a tile boundary of the first tile; and a control unit, arranged for loading a selected read pointer from a pointer buffer to act as a currently used read pointer when the judging unit judges that the tile boundary is encountered; wherein the currently used read pointer is indicative of an access position of a requested data to be decoded; when the selected read pointer is loaded from the pointer buffer, the currently used read pointer is changed from a read pointer of the first tile to the selected read pointer; and the currently used read pointer is not changed from the read pointer of the first tile to the selected read pointer until it is judged that the tile boundary of the first tile is encountered.
 9. The buffer controller of claim 8, wherein the selected read pointer is a read pointer of a second tile to be decoded immediately after the first tile.
 10. The buffer controller of claim 8, wherein the tile boundary is a right vertical boundary.
 11. The buffer controller of claim 8, wherein the whole multi-tile encoded picture is decoded in a raster scan order, and decoding of a portion of a next tile starts after decoding of a portion of a current tile is completed.
 12. The buffer controller of claim 8, wherein the multi-tile encoded picture complies with a High-Efficiency Video Coding (HEVC) specification or a Joint Photographic Experts Group extended range (JPEG-XR) specification.
 13. The buffer controller of claim 8, wherein the multi-tile encoded picture has N horizontally adjacent partitions, a number of read pointers maintained in the pointer buffer is equal to N, and N is a positive integer.
 14. The buffer controller of claim 8, wherein when the judging unit judges that the tile boundary of the first tile is encountered, the control unit is further arranged for storing the read pointer of the first tile acting as a previous currently used read pointer into the pointer buffer.
 15. A buffering apparatus for buffering data of at least a multi-tile encoded picture having a plurality of tiles included therein, the buffering apparatus comprising: a first storage space, arranged to buffer a first tile of the multi-tile encoded picture; and a second storage space, arranged to buffer a second tile of the multi-tile encoded picture; wherein the first tile is currently decoded, the second tile is not currently decoded, and an output of the second storage space is not fed into the first storage space.
 16. The buffering apparatus of claim 15, wherein the first storage space and the second storage space are provided by a plurality of ring buffers dedicated to buffering data of the first tile and data of the second tile, respectively.
 17. The buffering apparatus of claim 15, wherein the tiles of the multi-tile encoded picture are transmitted sequentially, and the buffering apparatus further comprises: a prefetch circuit, arranged to prefetch data of the second tile and store prefetched data into the second storage space while the first storage space is receiving and buffering data of the first tile.
 18. The buffering apparatus of claim 15, further comprising: a prefetch circuit, comprising: a storage unit; and a prefetch unit, arranged to prefetch data of the second tile from the second storage space and store prefetched data into the storage unit while a decoding apparatus is decoding data of the first tile, wherein the prefetched data stored in the prefetch unit is read by the decoding apparatus when the decoding apparatus is operative to start decoding the data of the second tile.
 19. The buffering apparatus of claim 15, wherein the multi-tile encoded picture complies with a High-Efficiency Video Coding (HEVC) specification or a Joint Photographic Experts Group extended range (JPEG-XR) specification.
 20. The buffering apparatus of claim 15, wherein the data of the multi-tile encoded picture is decoded in a raster scan order, and decoding of a portion of the second tile starts after decoding of a portion of the first tile is completed.
 21. The buffering apparatus of claim 15, wherein the multi-tile encoded picture has N horizontally adjacent partitions, a number of storage spaces implemented in the buffering apparatus is equal to N, and N is a positive integer.
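
For illustration only, the read pointer maintenance recited for the buffer controller (judging a right vertical tile boundary, storing the currently used read pointer, and loading the read pointer of the next tile) may be modeled by the following minimal sketch. It is not the claimed implementation. All names (BufferController, pointer_buffer, read_block, decode_raster) are hypothetical, and the sketch assumes that every block consumes exactly one data unit, that the tiles are N horizontally adjacent partitions, and that each tile's data occupies a contiguous region of the buffer.

```python
class BufferController:
    """Hypothetical model of a judging unit plus a control unit."""

    def __init__(self, tile_offsets, tile_widths):
        # One saved read pointer per horizontally adjacent tile (N pointers).
        self.pointer_buffer = list(tile_offsets)
        self.tile_widths = list(tile_widths)   # width of each tile, in blocks
        self.current_tile = 0
        self.read_ptr = self.pointer_buffer[0]  # currently used read pointer
        self.col_in_tile = 0                    # blocks decoded in this tile row

    def tile_boundary_reached(self):
        # Judging step: has decoding hit the right vertical tile boundary?
        return self.col_in_tile == self.tile_widths[self.current_tile]

    def switch_tile(self):
        # Store the currently used read pointer, then load the pointer of
        # the tile to be decoded next (wrapping to tile 0 for the next row).
        self.pointer_buffer[self.current_tile] = self.read_ptr
        self.current_tile = (self.current_tile + 1) % len(self.pointer_buffer)
        self.read_ptr = self.pointer_buffer[self.current_tile]
        self.col_in_tile = 0

    def read_block(self, data):
        # Assumption: one block corresponds to one data unit in the buffer.
        value = data[self.read_ptr]
        self.read_ptr += 1
        self.col_in_tile += 1
        if self.tile_boundary_reached():
            self.switch_tile()
        return value


def decode_raster(data, tile_offsets, tile_widths, rows):
    """Read the whole picture in raster scan order; return blocks and pointers."""
    ctrl = BufferController(tile_offsets, tile_widths)
    blocks_per_row = sum(tile_widths)
    out = [ctrl.read_block(data) for _ in range(rows * blocks_per_row)]
    return out, ctrl.pointer_buffer


if __name__ == "__main__":
    # Two tiles (2 and 3 blocks wide), two block rows, stored tile by tile.
    data = ["a0", "a1", "a2", "a3", "b0", "b1", "b2", "b3", "b4", "b5"]
    print(decode_raster(data, tile_offsets=[0, 4], tile_widths=[2, 3], rows=2))
```

In this example the blocks are returned in the order a0, a1, b0, b1, b2, a2, a3, b3, b4, b5. The read pointer of the first tile is left unchanged until its right boundary is reached, and is then resumed from the pointer buffer when the next block row begins.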